Abstract

Using well-known results from statistical physics, concerning the almost-sure behavior of 
the free energy of directed polymers in a random medium, we prove that random tree codes 
achieve the distortion-rate function almost surely under a certain symmetry condition. 
1 Introduction



Tree source coding with a fidelity criterion has been studied since the late 1960s and the early
1970s, see, e.g., [1, Subsection 6.2.4], [6], [10], [12], [14], [15]. The first results were
obtained by Jelinek and Anderson [15] for tree coding of binary sources with the Hamming
distortion measure, and by Dick, Berger and Jelinek [10] for Gaussian sources and the squared
error distortion measure. Davis and Hellman [6] proved a tree coding theorem for a general
memoryless source and a general fidelity criterion. In particular, they pointed out that in an
earlier paper by Jelinek [14], the proof of the coding theorem was valid only for symmetric
sources, and so, by modifying the branching process associated with the tree code, they were able
to relax the symmetry condition of the tree coding theorem. In this context, it should be pointed
out that Gallager [12] also made a symmetry assumption in the same spirit.

The main message in this short paper is, first of all, the observation that the tree source coding
problem is very intimately related to an important model in statistical physics of disordered systems,
namely, the directed polymer in a random medium (DPRM), cf., e.g., [2], [3], [4], [5], [7], [8], [11], [18], [19]
and references therein. Loosely speaking, in the DPRM, each configuration of the underlying
physical system corresponds to a walk along consecutive bonds of a certain lattice, or a tree, where
each such bond is assigned an independent random variable (energy), and where the total
energy (which is analogous to the distortion of the tree code) of this walk is the sum of energies
along the bonds visited. For a given realization of these random energy variables, the probability
of each walk is given by the Boltzmann distribution, namely, it is proportional to an exponential 
function of the negative total energy. The main challenge, as usual in equilibrium statistical physics, 
is to characterize the asymptotic normalized free energy of a typical realization of the system. For 
the case where the walks are defined on a tree (from the root to one of the leaves), this problem 
has a closed-form solution. 

This relationship between tree codes and the DPRM is interesting in its own right. It turns
out to be so strong that the various analysis techniques^1 and the results concerning the DPRM
can readily be harnessed for the ensemble performance analysis of tree codes. In particular, the
distortion achieved by the best codeword in the tree codebook is identified with the free energy
of the DPRM when the system is frozen (taken to zero temperature). This observation does
not merely provide an alternative proof of the tree coding theorem; it also enables us to
show that, at least under a certain symmetry assumption concerning the source and the distortion
function^2, the distortion-rate function is achieved eventually almost surely (with respect to the
randomness of the code) for every individual source sequence. This is different from (and stronger
than) the previous findings, mentioned in the first paragraph above, which were coding theorems
concerning the average distortion.

^1 These techniques are different from those of the papers mentioned in the first paragraph.

The outline of this work is as follows: In Section 2, we establish our notation conventions and 
give a brief background in statistical mechanics in general and on the DPRM in particular. In 
Section 3, we show how the solution to the DPRM model can be used to prove that the tree code
ensemble achieves the distortion-rate function almost surely for every input. Finally, in Section 4, we
provide a short summary of this paper.

2 Notation Conventions and Background 
2.1 Notation Conventions 

Throughout this paper, scalar random variables (RV's) will be denoted by capital letters, like $X$
and $Y$, their sample values will be denoted by the respective lower case letters, and their alphabets
will be denoted by the respective calligraphic letters. A similar convention will apply to random
vectors and their sample values, which will be denoted with the same symbols in the boldface font.
Thus, for example, $\mathbf{X}$ will denote a random $n$-vector $(X_1, \ldots, X_n)$, and
$\mathbf{x} = (x_1, \ldots, x_n)$ is a specific vector value in $\mathcal{X}^n$, the $n$-th Cartesian
power of $\mathcal{X}$. Sources and other probability measures that underlie sequence generation will
be denoted generically by the letters $P$ and $Q$, and specific letter probabilities will be denoted
by the corresponding lower case letters, e.g., $p(x)$, $q(y)$, etc. The expectation operator will be
denoted by $E\{\cdot\}$. Information theoretic quantities like entropies and mutual informations will
be denoted following the usual conventions of the Information Theory literature.

^2 This assumption is in the spirit of the above mentioned assumption by Gallager, though it is somewhat different.
2.2 Background



Consider a physical system with $n$ particles, which can be in a variety of microscopic states
('microstates'), defined by combinations of physical quantities associated with these particles, e.g.,
positions, momenta, angular momenta, spins, etc., of all $n$ particles. For each such microstate of the
system, which we shall designate by a vector $\mathbf{x} = (x_1, \ldots, x_n)$, there is an associated
energy, given by a Hamiltonian (energy function), $\mathcal{E}(\mathbf{x})$. For example, if
$x_i = (\mathbf{p}_i, \mathbf{r}_i)$, where $\mathbf{p}_i$ is the momentum vector of particle number $i$
and $\mathbf{r}_i$ is its position vector, then classically,
$\mathcal{E}(\mathbf{x}) = \sum_{i=1}^n \left[\frac{\|\mathbf{p}_i\|^2}{2m} + mgz_i\right]$,
where $m$ is the mass of each particle, $z_i$ is its height (one of the coordinates of $\mathbf{r}_i$),
and $g$ is the gravitational acceleration.

One of the most fundamental results in statistical physics (based on the law of energy conservation
and the basic postulate that all microstates of the same energy level are equiprobable) is
that when the system is in thermal equilibrium with its environment, the probability of finding the
system in a microstate $\mathbf{x}$ is given by the Boltzmann-Gibbs distribution

$$P(\mathbf{x}) = \frac{e^{-\beta\mathcal{E}(\mathbf{x})}}{Z(\beta)}, \tag{1}$$

where $\beta = 1/(kT)$, $k$ being Boltzmann's constant and $T$ being temperature, and $Z(\beta)$ is the
normalization constant, called the partition function, which is given by

$$Z(\beta) = \sum_{\mathbf{x}} e^{-\beta\mathcal{E}(\mathbf{x})}$$

or

$$Z(\beta) = \int d\mathbf{x}\, e^{-\beta\mathcal{E}(\mathbf{x})},$$

depending on whether $\mathbf{x}$ is discrete or continuous. The role of the partition function is far
deeper than just being a normalization factor, as it is actually the key quantity from which many
macroscopic physical quantities can be derived. For example, the free energy^3 is
$F(\beta) = -\frac{1}{\beta}\ln Z(\beta)$, the average internal energy (i.e., the expectation of
$\mathcal{E}(\mathbf{x})$, where $\mathbf{x}$ is drawn according to (1)) is given by
$\bar{\mathcal{E}} = E\{\mathcal{E}(\mathbf{X})\} = -(d/d\beta)\ln Z(\beta)$, the heat capacity is
obtained from the second derivative, etc. One of the ways to obtain eq. (1) is as the maximum entropy
distribution under an average energy constraint (owing to the second law of thermodynamics), where
$\beta$ plays the role of a Lagrange multiplier that controls the average energy.

^3 The free energy means the maximum work that the system can carry out in any process of fixed temperature.
The maximum is obtained when the process is reversible (slow, quasi-static changes in the system).
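The relations above can be checked numerically on a toy system. The following is a minimal sketch, assuming a hypothetical discrete system with four energy levels (the values in `energies` and the choice of `beta` are illustrative assumptions, not taken from the paper). It computes the Boltzmann-Gibbs probabilities, the partition function and the free energy, and compares the average internal energy with a finite-difference estimate of $-(d/d\beta)\ln Z(\beta)$.

```python
import math

def boltzmann(energies, beta):
    """Return (probabilities, Z) of the Boltzmann-Gibbs distribution."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

def log_partition(energies, beta):
    """ln Z(beta) for a finite list of microstate energies."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

energies = [0.0, 1.0, 1.0, 2.5]   # hypothetical energy levels (assumption)
beta = 0.7                        # hypothetical inverse temperature (assumption)

probs, z = boltzmann(energies, beta)
free_energy = -math.log(z) / beta
mean_energy = sum(p * e for p, e in zip(probs, energies))

# Central-difference estimate of (d/d beta) ln Z(beta).
h = 1e-5
derivative = (log_partition(energies, beta + h)
              - log_partition(energies, beta - h)) / (2 * h)

print(f"Z = {z:.6f}, F = {free_energy:.6f}")
print(f"mean energy = {mean_energy:.6f}, -d ln Z / d beta = {-derivative:.6f}")
```

The two numbers on the last line should agree up to the accuracy of the finite difference.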

Quite often, real-world physical systems of many particles, such as magnetic materials and 
solid-state devices, are subjected to effects of impurity (e.g., defects) that may appear as amorphic 
structures and disorder. To model such disorder, it is customary to let the Hamiltonian, £{x), 
depend also on certain random parameters and to examine the behavior of systems pertaining to 
typical realizations of these random parameters. There are many models of this kind in the physics 
literature. One of them is the DPRM, which is defined on a certain graph, such as a hypercubic 
lattice, or a tree. We henceforth focus on the latter and describe it more formally than in the 
Introduction. 

Consider a Cayley tree, namely, a full balanced tree with branching ratio $d$ and depth $n$ (cf. Fig.
1, where $d = 2$ and $n = 3$). Let us index the branches by a pair of integers $(i, j)$, where
$1 \le i \le n$ describes the generation (with $i = 1$ corresponding to the $d$ branches that emanate
from the root), and $0 \le j \le d^i - 1$ enumerates the branches of the $i$-th generation, say, from
left to right (see Fig. 1). For each branch $(i, j)$, $0 \le j \le d^i - 1$, $1 \le i \le n$, we
randomly draw an independent random variable $\varepsilon_{i,j}$ according to a fixed probability
function $q(\varepsilon)$ (i.e., a probability mass function in the discrete case, or probability
density function in the continuous case).




Figure 1: A Cayley tree with branching factor d = 2 and depth n = 3. 

A walk $w$, from the root of the tree to one of its leaves, is described by a finite sequence
$\{(i, j_i)\}_{i=1}^n$, where $0 \le j_1 \le d - 1$ and $dj_i \le j_{i+1} \le dj_i + d - 1$,
$i = 1, 2, \ldots, n - 1$.^4 For a given realization of the RV's
$\{\varepsilon_{i,j} : i = 1, 2, \ldots, n,\ j = 0, 1, \ldots, d^i - 1\}$, we define the Hamiltonian
associated with $w$ as $\mathcal{E}(w) = \sum_{i=1}^n \varepsilon_{i,j_i}$, and then the partition
function as:

$$Z_n(\beta) = \sum_w \exp\{-\beta\mathcal{E}(w)\}. \tag{2}$$
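To make the indexing of branches and walks concrete, here is a small sketch that enumerates all walks of a tree with $d = 2$ and $n = 3$ and evaluates the partition function by brute force. The function names, the uniform energies and the fixed seed are illustrative assumptions; the parent-child rule (branch $(i, j)$ has children $(i+1, dj), \ldots, (i+1, dj + d - 1)$) is the one stated above.

```python
import itertools
import math
import random

def enumerate_walks(d, n):
    """Yield each root-to-leaf walk as [(1, j_1), ..., (n, j_n)].

    Branch (i, j) has children (i + 1, d*j), ..., (i + 1, d*j + d - 1).
    """
    for steps in itertools.product(range(d), repeat=n):
        j, walk = 0, []
        for i, s in enumerate(steps, start=1):
            j = d * j + s          # index of the chosen branch in generation i
            walk.append((i, j))
        yield walk

def partition_function(energy, d, n, beta):
    """Brute-force Z_n(beta): sum over all walks of exp(-beta * E(w))."""
    return sum(
        math.exp(-beta * sum(energy[branch] for branch in walk))
        for walk in enumerate_walks(d, n)
    )

d, n = 2, 3
rng = random.Random(0)  # fixed seed, for illustration only
energy = {(i, j): rng.random() for i in range(1, n + 1) for j in range(d ** i)}

walks = list(enumerate_walks(d, n))
print(len(walks))   # 8, i.e. d**n walks
print(walks[5])     # [(1, 1), (2, 2), (3, 5)]
print(partition_function(energy, d, n, beta=1.0))
```

Note that the last index $j_n$ of each walk equals its position in the enumeration, which reflects the fact that the leaf alone determines the walk.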

Of course, since $\{\varepsilon_{i,j}\}$ are RV's, then so is $Z_n(\beta)$. The primary question
addressed by physicists, in this context, concerns the (typical) behavior of the RV

$$f_n(\beta) = \frac{1}{n\beta}\ln Z_n(\beta) \tag{3}$$

for $n$ large, which is (up to the minus sign) exactly the normalized free energy per step. It turns out
(as proved, e.g., in [3], [S]) that $f_n(\beta)$ has a self-averaging property, in the terminology of
physicists; in other words, the sequence of random variables $\{f_n(\beta)\}_{n \ge 1}$ converges in
probability (and in fact, almost surely, as is shown in [3]) to a deterministic constant $f(\beta)$,
which is given by

$$f(\beta) = \begin{cases} \phi(\beta) & \beta \le \beta_c \\ \phi(\beta_c) & \beta > \beta_c \end{cases} \tag{4}$$

with the function $\phi(\beta)$ being defined as

$$\phi(\beta) \triangleq \frac{\ln[d \cdot E\{e^{-\beta\varepsilon}\}]}{\beta}, \tag{5}$$

where the expectation, which is assumed finite, is taken w.r.t. $q(\varepsilon)$, and where $\beta_c$
is the value of $\beta$ at which $\phi(\beta)$ is minimum, or equivalently, the solution to the
equation $\phi'(\beta) = 0$, where $\phi'$ is the derivative of $\phi$.

As can be seen, $\beta = \beta_c$ is a point at which the asymptotic normalized free energy per step,
$f(\beta)$, changes its behavior: Although $f(\beta)$ and its first derivative are continuous functions
for all $\beta$, the second derivative is discontinuous at $\beta = \beta_c$. In the terminology of
physicists, this is referred to as a second order phase transition. Observe that while one might expect
that the sequence $f_n(\beta)$ would converge to the same limit as
$E\{f_n(\beta)\} = \frac{1}{n\beta}E\{\ln Z_n(\beta)\}$, i.e., the so called quenched average, the
high temperature phase result, $\phi(\beta)$, corresponds to $\frac{1}{n\beta}\ln[E\{Z_n(\beta)\}]$,
which is called the annealed average. This means that Jensen's inequality is essentially tight at this
range of $\beta$. However, these two averages depart from each other at the low temperature phase,
$\beta > \beta_c$. As can be observed, in this phase, the asymptotic normalized free energy no longer
depends on $\beta$, and it is referred to as the glassy phase or the frozen phase, which is
characterized by zero thermodynamical entropy; in other words, the partition function is dominated by a
sub-exponential number of configurations possessing the ground-state energy (cf., e.g., [17, Chap. 5]).
For reasons that will become apparent shortly, this frozen phase is the relevant phase for our source
coding problem.

^4 In fact, for a given $n$, the number $j_n$ alone dictates the entire walk.
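As an illustration of the two phases, consider the special case (an assumption made here for concreteness, not a case singled out in the text) of standard Gaussian branch energies. Then $E\{e^{-\beta\varepsilon}\} = e^{\beta^2/2}$, so $\phi(\beta) = (\ln d)/\beta + \beta/2$, $\beta_c = \sqrt{2\ln d}$ and $\phi(\beta_c) = \sqrt{2\ln d}$. The sketch below draws one realization of the tree, computes $\ln Z_n(\beta)$ by a leaf-to-root recursion in the log domain, and prints $f_n(\beta)$ next to the limit $f(\beta)$. The depth, the seed and the values of $\beta$ are arbitrary choices.

```python
import math
import numpy as np

def phi(beta, d):
    """phi(beta) of eq. (5), specialized to standard Gaussian energies."""
    return math.log(d) / beta + beta / 2.0

def f_limit(beta, d):
    """Limiting normalized free energy of eq. (4) for Gaussian energies."""
    beta_c = math.sqrt(2.0 * math.log(d))
    return phi(min(beta, beta_c), d)

def log_partition(energies, d, beta):
    """ln Z_n(beta) via a leaf-to-root recursion in the log domain.

    energies[i] holds the d**(i+1) branch energies of generation i+1;
    branch j has children d*j, ..., d*j + d - 1 in the next generation.
    """
    log_z = np.zeros(d ** len(energies))             # below a leaf: ln 1 = 0
    for eps in reversed(energies):
        terms = (-beta * eps + log_z).reshape(-1, d)  # one row per group of siblings
        peak = terms.max(axis=1)
        log_z = peak + np.log(np.exp(terms - peak[:, None]).sum(axis=1))
    return float(log_z[0])

d, n = 2, 16                          # hypothetical tree size (assumption)
rng = np.random.default_rng(1)        # fixed seed, illustration only
energies = [rng.standard_normal(d ** i) for i in range(1, n + 1)]

for beta in (0.5, 1.0, 2.0, 4.0):
    f_n = log_partition(energies, d, beta) / (n * beta)
    print(f"beta={beta}: f_n={f_n:.3f}  f={f_limit(beta, d):.3f}")
```

For such a small depth the agreement is only rough, since the convergence in $n$ is slow, especially at and beyond $\beta_c$; the sketch is meant to show the mechanics rather than to verify the limit.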

The asymptotic free energy formula (4) has been proved in the physics literature in at least four
different ways: The first [3] is based on martingales, the second is based on non-integer moments
of the partition function [8], [11], the third is based on a recursion of a certain generating function 
of the partition function as well as on traveling waves [Zj ) [S] , and the fourth method is the so-called 
replica method [7], which, although not rigorous, is very useful in statistical mechanics. 

3 Main Result 

We now turn to our lossy source coding problem, where some of the notation that will be used will
be deliberately identical to that of Subsection 2.2. Consider a discrete memoryless source (DMS)
$P$ that generates symbols $X_1, X_2, \ldots$ from a finite^5 alphabet $\mathcal{X}$. Let $\mathcal{Y}$
denote a finite reproduction alphabet and let $\rho : \mathcal{X} \times \mathcal{Y} \to [0, \infty)$
be a given distortion function.

Consider next an ensemble of tree codes for encoding source $n$-tuples, $\mathbf{x} = (x_1, \ldots, x_n)$,
which is defined as follows: Given a coding rate $R$ (in nats/source-symbol), which is assumed to be
the natural logarithm of some positive integer^6 $d$, and given a probability distribution on the
reproduction alphabet, $Q = \{q(y),\ y \in \mathcal{Y}\}$, let us draw $d = e^R$ independent copies of
$Y$ under $Q$, and denote them by $Y_1, Y_2, \ldots, Y_d$. We shall refer to the randomly chosen set,
$\mathcal{C}_1 = \{Y_1, Y_2, \ldots, Y_d\}$, as our 'codebook' for the first source symbol, $X_1$. Next,
for each $1 \le j_1 \le d$, we randomly select another such codebook under $Q$,
$\mathcal{C}_{2,j_1} = \{Y_{j_1,1}, Y_{j_1,2}, \ldots, Y_{j_1,d}\}$, for the second symbol, $X_2$.
Then, for each $1 \le j_1 \le d$ and $1 \le j_2 \le d$, we again draw under $Q$ yet another codebook
$\mathcal{C}_{3,j_1,j_2} = \{Y_{j_1,j_2,1}, Y_{j_1,j_2,2}, \ldots, Y_{j_1,j_2,d}\}$, for $X_3$, and so
on. In general, for each $t \le n$, we randomly draw $d^{t-1}$ codebooks under $Q$, which are indexed
by $(j_1, j_2, \ldots, j_{t-1})$, $1 \le j_k \le d$, $1 \le k \le t - 1$.

^5 Finite alphabet assumptions are made mostly for simplicity. It is expected that our derivations continue to hold
in the continuous case as well under suitable regularity conditions.

^6 At first sight, it might appear that this gives a rather limited variety of coding rates to work with. Obviously, this
can be improved by working with a superalphabet of (small) blocks, as was done in previous works on tree coding.
But since the source alphabet could have been defined for these blocks in the first place, there is no essential loss of
generality in this setting.

Once the above described random code selection process is complete, the resulting set of codebooks
$\{\mathcal{C}_1, \mathcal{C}_{t,j_1,\ldots,j_{t-1}},\ 2 \le t \le n,\ 1 \le j_k \le d,\ 1 \le k \le t - 1\}$
is revealed to both the encoder and decoder, and the encoding-decoding system works as follows:

• Encoding: Given a source $n$-tuple $\mathbf{x}$, find a vector of indices $(j_1^*, j_2^*, \ldots, j_n^*)$
that minimizes the overall distortion $\sum_{t=1}^n \rho(x_t, Y_{j_1,\ldots,j_t})$. Represent each
component $j_t^*$ (based on $j_{t-1}^*$) by $R = \ln d$ nats (that is, $\log_2 d$ bits), thus a total
of $nR$ nats.

• Decoding: At each time $t$ ($1 \le t \le n$), after having decoded $(j_1^*, \ldots, j_t^*)$, output
the reproduction symbol $Y_{j_1^*,\ldots,j_t^*}$.
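The code construction and the encoder/decoder pair described above can be sketched as follows. This is a minimal illustration under stated assumptions: indices run over $\{0, \ldots, d-1\}$ rather than $\{1, \ldots, d\}$, the encoder uses a plain exhaustive search over all $d^n$ walks, and the 4-letter alphabet, Hamming distortion, uniform $Q$, block length and seed are hypothetical choices. The function names are introduced here for illustration only.

```python
import numpy as np

def draw_tree_code(q, d, n, rng):
    """Level t (0-based) holds d**(t+1) reproduction symbols drawn i.i.d. from q.

    Node j at level t has children d*j, ..., d*j + d - 1 at level t+1.
    """
    alphabet = np.arange(len(q))
    return [rng.choice(alphabet, size=d ** (t + 1), p=q) for t in range(n)]

def encode(x, code, rho, d):
    """Exhaustive search for the minimum-distortion walk.

    Returns the base-d digits (j_1, ..., j_n) of the best walk and its distortion.
    """
    cum = np.zeros(1)
    for x_t, level in zip(x, code):
        cum = np.repeat(cum, d) + rho[x_t, level]   # extend every walk by one branch
    leaf = int(np.argmin(cum))
    digits = []
    for _ in range(len(x)):
        leaf, digit = divmod(leaf, d)
        digits.append(digit)
    return digits[::-1], float(cum.min())

def decode(digits, code, d):
    """Sequential decoder: one reproduction symbol per received index."""
    node, out = 0, []
    for j_t, level in zip(digits, code):
        node = d * node + j_t
        out.append(int(level[node]))
    return out

rng = np.random.default_rng(0)          # fixed seed, illustration only
d, n, m = 2, 12, 4                      # hypothetical parameters (assumption)
q = np.full(m, 1.0 / m)                 # uniform random coding distribution
rho = 1.0 - np.eye(m)                   # Hamming distortion matrix
x = rng.integers(0, m, size=n)          # one source block
code = draw_tree_code(q, d, n, rng)

digits, dist = encode(x, code, rho, d)
y = decode(digits, code, d)
print(digits)
print(dist / n, sum(rho[a, b] for a, b in zip(x, y)) / n)
```

The two numbers on the last line coincide: the distortion found by the encoder is exactly the distortion of the sequence that the decoder reproduces, symbol by symbol, from the $n$ indices.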

A few comments are in order at this point: First, as we see, the codebook generation process is
branching hierarchically by a factor of $d$ at each step, hence it is convenient to think of the code
as having the structure of a Cayley tree, as in Subsection 2.2. The encoder seeks the best walk on
that tree in the sense of minimum distortion. Note also that the process of converting the optimum
walk $w^* = (j_1^*, j_2^*, \ldots, j_n^*)$ into a compressed bitstream is extremely simple: We just
convert each $j_t^* \in \{1, \ldots, d\}$ into its binary representation using $\log_2 d$ bits without
any attempt at compression. In other words, the entropy coding part is trivial in the sense that it
uses neither the memory that may be present in the sequence $(j_1^*, j_2^*, \ldots, j_n^*)$ nor the
possible skewness of the distributions of these symbols. Finally, the decoding process is a purely
sequential delayless process: At time $t$, the decoder outputs the $t$-th reproduction symbol. This is
in contrast to the decoder of a general block code, which has to wait until the entire bit string of
length $nR$ has been received before it can start to decode. Thus, at least the decoding delay is saved
this way. There is also a slight reduction in the search complexity at the encoder, due to the tree
structure, but not a dramatic one.

In order to analyze the rate-distortion performance of this ensemble of codes, using the results
of Subsection 2.2, we now make the following assumption:

The random coding distribution $Q$ is such that the distribution of the RV $\rho(x, Y)$ is the same
for all $x \in \mathcal{X}$.

It turns out that this assumption is fulfilled quite often: it is the case whenever the random
coding distribution together with the distortion function exhibit a sufficiently high degree of
symmetry. This happens, for example, if $Q$ is the uniform distribution over $\mathcal{Y}$ and the rows
of the distortion matrix $\{\rho(x, y)\}$ are permutations of each other, which is in turn the case,
for example, when $\mathcal{X} = \mathcal{Y}$ is a group and $\rho(x, y) = \gamma(x - y)$ is a
difference distortion function w.r.t. the group difference operation. Somewhat more generally, this
assumption still holds when the different rows of the distortion matrix are formed by permutations of
each other subject to the following rule: $\rho(x, y)$ can be swapped with $\rho(x, y')$ provided that
$q(y') = q(y)$.
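The symmetry assumption is easy to test mechanically for a given pair $(Q, \rho)$ on finite alphabets. The sketch below (function names and example matrices are illustrative assumptions) tabulates the distribution of $\rho(x, Y)$ for each row $x$ of the distortion matrix and checks that all rows induce the same distribution.

```python
from collections import defaultdict

def distortion_law(q, rho_row):
    """Distribution of rho(x, Y) for one source letter x, with Y drawn from q."""
    law = defaultdict(float)
    for prob, dist in zip(q, rho_row):
        if prob > 0:
            law[dist] += prob
    return dict(law)

def satisfies_symmetry(q, rho, tol=1e-12):
    """True if rho(x, Y) has the same distribution for every row x of rho."""
    laws = [distortion_law(q, row) for row in rho]
    ref = laws[0]
    return all(
        law.keys() == ref.keys()
        and all(abs(law[v] - ref[v]) <= tol for v in ref)
        for law in laws[1:]
    )

hamming3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
uniform3 = [1 / 3, 1 / 3, 1 / 3]
skewed3 = [0.5, 0.25, 0.25]
# Rows differ only by swapping entries whose reproduction letters are equiprobable.
swapped = [[0, 1, 2], [0, 2, 1]]

print(satisfies_symmetry(uniform3, hamming3))  # True
print(satisfies_symmetry(skewed3, hamming3))   # False
print(satisfies_symmetry(skewed3, swapped))    # True
```

The third case is an instance of the more general rule: the two rows are permutations of each other in which only entries with $q(y') = q(y)$ are exchanged.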

It should be pointed out that if the optimum random coding distribution $Q^*$, namely, the one
corresponding to the output of the test channel that achieves the rate-distortion function of $X$,
happens to satisfy the above symmetry assumption, then as we show below (using a technique
different from those of the earlier papers on tree coding), the rate-distortion performance of the
above described code ensemble achieves the rate-distortion function. Moreover, this will turn out
to be the case, not only in expectation, but also with probability one.

We now turn to our analysis, which makes heavy use of the results of Subsection 2.2. For a
given $\mathbf{x}$ and a given realization of the set of codebooks, define the partition function in
analogy to that of the DPRM:

$$Z_n(\beta) = \sum_w \exp\left\{-\beta\sum_{t=1}^n \rho(x_t, Y_{j_1,\ldots,j_t})\right\}, \tag{6}$$

where the summation extends over all possible walks, $w = (j_1, \ldots, j_n)$, along the Cayley tree,
as defined in Subsection 2.2. Clearly, considering our symmetry assumption, this falls exactly under
the umbrella of the DPRM, with the distortions $\{\rho(x_t, Y_{j_1,\ldots,j_t})\}$ playing the role of
the branch energies $\{\varepsilon_{i,j}\}$. Therefore, $\frac{1}{n\beta}\ln Z_n(\beta)$ converges
almost surely, as $n$ grows without bound, to $f(\beta)$, now defined as

$$f(\beta) = \begin{cases} \phi(\beta) & \beta \le \beta_c \\ \phi(\beta_c) & \beta > \beta_c \end{cases} \tag{7}$$

where

$$\phi(\beta) \triangleq \frac{\ln[d \cdot E\{e^{-\beta\rho(x,Y)}\}]}{\beta} = \frac{\ln[e^R \cdot E\{e^{-\beta\rho(x,Y)}\}]}{\beta} = \frac{R + \ln[E\{e^{-\beta\rho(x,Y)}\}]}{\beta}, \tag{8}$$

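where $x$ is an arbitrary member of $\mathcal{X}$, which is immaterial by the symmetry assumption.
Thus, for every $(x_1, x_2, \ldots)$, the distortion is given by

\begin{align}
\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n \rho(x_t, Y_{j_1^*,\ldots,j_t^*})
&= \limsup_{n\to\infty} \frac{1}{n}\min_w \left[\sum_{t=1}^n \rho(x_t, Y_{j_1,\ldots,j_t})\right] \nonumber \\
&= \limsup_{n\to\infty} \limsup_{\ell\to\infty} \left[-\frac{\ln Z_n(\beta_\ell)}{n\beta_\ell}\right] \nonumber \\
&\le \limsup_{\ell\to\infty} \limsup_{n\to\infty} \left[-\frac{\ln Z_n(\beta_\ell)}{n\beta_\ell}\right] \nonumber \\
&\stackrel{\text{a.s.}}{=} -\liminf_{\ell\to\infty} f(\beta_\ell) \nonumber \\
&= -\phi(\beta_c) \tag{9} \\
&= \max_{\beta > 0}\left[-\frac{\ln[E\{e^{-\beta\rho(x,Y)}\}] + R}{\beta}\right] \nonumber \\
&\triangleq D_0(R), \tag{10}
\end{align}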

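where: (i) $\{\beta_\ell\}_{\ell \ge 1}$ is an arbitrary sequence tending to infinity, (ii) the
almost-sure equality is due to [2, Theorem 1], and (iii) the inequality at the third line is justified
by the following chain:

\begin{align}
\limsup_{n\to\infty} \limsup_{\ell\to\infty} \left[-\frac{\ln Z_n(\beta_\ell)}{\beta_\ell n}\right]
&\le \limsup_{n\to\infty} \limsup_{\ell\to\infty} \left[-\frac{\ln \exp\{-\beta_\ell \sum_{t=1}^n \rho(x_t, Y_{j_1^*,\ldots,j_t^*})\}}{\beta_\ell n}\right] \nonumber \\
&= \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n \rho(x_t, Y_{j_1^*,\ldots,j_t^*}) \nonumber \\
&= \limsup_{\ell\to\infty} \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n \rho(x_t, Y_{j_1^*,\ldots,j_t^*}) \nonumber \\
&= \limsup_{\ell\to\infty} \limsup_{n\to\infty} \left[-\frac{\ln \exp\{-\beta_\ell \sum_{t=1}^n \rho(x_t, Y_{j_1^*,\ldots,j_t^*})\}}{\beta_\ell n}\right] \nonumber \\
&\le \limsup_{\ell\to\infty} \limsup_{n\to\infty} \left[-\frac{\ln[d^{-n} Z_n(\beta_\ell)]}{\beta_\ell n}\right] \nonumber \\
&= \limsup_{\ell\to\infty} \limsup_{n\to\infty} \left[-\frac{\ln Z_n(\beta_\ell)}{\beta_\ell n} + \frac{\ln d}{\beta_\ell}\right] \nonumber \\
&= \limsup_{\ell\to\infty} \limsup_{n\to\infty} \left[-\frac{\ln Z_n(\beta_\ell)}{\beta_\ell n}\right]. \tag{11}
\end{align}

Here, the first inequality holds since $Z_n(\beta_\ell)$ is lower bounded by its largest term, the one
pertaining to the optimum walk, and the second inequality holds since $Z_n(\beta_\ell)$ is upper
bounded by $d^n$ times that term.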
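The two bounds used in the chain above amount to the sandwich $-\frac{\ln Z_n(\beta)}{n\beta} \le \frac{1}{n}\min_w \sum_t \rho(x_t, Y_{j_1,\ldots,j_t}) \le -\frac{\ln Z_n(\beta)}{n\beta} + \frac{\ln d}{\beta}$, which holds for every realization of the code. The following sketch checks it numerically for one randomly drawn tree code. The alphabet size, block length, seed and values of $\beta$ are hypothetical, and the code is drawn with a uniform $Q$ and Hamming distortion purely for illustration.

```python
import math
import numpy as np

def walk_distortions(x, code, rho, d):
    """Total distortion of every root-to-leaf walk (array of length d**n)."""
    cum = np.zeros(1)
    for x_t, level in zip(x, code):
        cum = np.repeat(cum, d) + rho[x_t, level]
    return cum

def log_partition(cum, beta):
    """ln Z_n(beta) = ln sum_w exp(-beta * distortion(w)), computed stably."""
    shifted = -beta * (cum - cum.min())
    return -beta * cum.min() + math.log(np.exp(shifted).sum())

rng = np.random.default_rng(0)          # fixed seed, illustration only
d, n, m = 2, 12, 4                      # hypothetical sizes (assumption)
rho = 1.0 - np.eye(m)                   # Hamming distortion
x = rng.integers(0, m, size=n)
# Uniform Q: level t holds d**(t+1) i.i.d. reproduction symbols.
code = [rng.integers(0, m, size=d ** (t + 1)) for t in range(n)]

cum = walk_distortions(x, code, rho, d)
best = cum.min() / n
for beta in (1.0, 4.0, 16.0, 64.0):
    lower = -log_partition(cum, beta) / (n * beta)
    upper = lower + math.log(d) / beta
    print(f"beta={beta:5.1f}: {lower:.4f} <= {best:.4f} <= {upper:.4f}")
```

As $\beta$ grows, the gap $\ln d/\beta$ between the two bounds vanishes, which is the zero-temperature limit that identifies the minimum distortion with the free energy.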



We have shown then that the almost-sure distortion performance is uniformly given by $D_0(R)$ for
every individual source sequence $x_1, x_2, \ldots$. Now, let us suppose that $Q$ is chosen to be the
output distribution $Q^*$ induced by the source $P$ and the test channel $\mathcal{X} \to \mathcal{Y}$
that achieves the rate-distortion function, and that the symmetry assumption continues to hold for
$Q^* = \{q^*(y),\ y \in \mathcal{Y}\}$. Then, we claim that $D_0(R)$, defined with $Q = Q^*$, coincides
with the distortion-rate function of the source, $D(R)$.
To see why this is true, recall that the rate-distortion function $R(D)$ has the following
representation (see, e.g., ^ p. 90, Corollary 4.2.3], [20], [E]):

$$R(D) = -\min_{\beta \ge 0}\max_Q \left\{\beta D + \sum_{x\in\mathcal{X}} p(x)\ln\left[\sum_{y\in\mathcal{Y}} q(y)e^{-\beta\rho(x,y)}\right]\right\}, \tag{12}$$



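which, due to convexity in $\beta$ and concavity in $Q$, is equivalent to

$$R(D) = -\max_Q\min_{\beta \ge 0} \left\{\beta D + \sum_{x\in\mathcal{X}} p(x)\ln\left[\sum_{y\in\mathcal{Y}} q(y)e^{-\beta\rho(x,y)}\right]\right\} = -\min_{\beta \ge 0} \left\{\beta D + \sum_{x\in\mathcal{X}} p(x)\ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta\rho(x,y)}\right]\right\}, \tag{13}$$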



and which, under the symmetry assumption, tells us that for every point $(D, R)$ on the
rate-distortion curve, we have:

$$R = -\min_{\beta \ge 0} \left\{\beta D + \ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta\rho(x,y)}\right]\right\}. \tag{14}$$



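Let $\beta^*$ achieve this minimum, i.e.,

$$R = -\left\{\beta^* D + \ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta^*\rho(x,y)}\right]\right\} \tag{15}$$

or, equivalently,

$$D(R) = -\frac{\ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta^*\rho(x,y)}\right] + R}{\beta^*}. \tag{16}$$

Thus, clearly,

$$D(R) \le \max_{\beta > 0}\left[-\frac{\ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta\rho(x,y)}\right] + R}{\beta}\right] = D_0(R), \tag{17}$$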



and so, it remains to show also the converse inequality, $D(R) \ge D_0(R)$. To this end, observe that
eq. (14) implies that for every point $(D, R)$ on the rate-distortion function:

$$R \ge -\left\{\beta D + \ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta\rho(x,y)}\right]\right\} \tag{18}$$



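holds for all $\beta \ge 0$ (with equality for $\beta = \beta^*$). Equivalently, for all $\beta > 0$:

$$D \ge -\frac{\ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta\rho(x,y)}\right] + R}{\beta} \tag{19}$$

and so,

$$D(R) \ge \max_{\beta > 0}\left[-\frac{\ln\left[\sum_{y\in\mathcal{Y}} q^*(y)e^{-\beta\rho(x,y)}\right] + R}{\beta}\right] = D_0(R), \tag{20}$$

thus proving that $D_0(R) = D(R)$.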
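As a numerical sanity check of the identity $D_0(R) = D(R)$, consider a hypothetical example that is not taken from the text: a uniform source over an alphabet of size $m = 4$ with Hamming distortion, for which the uniform $Q^*$ satisfies the symmetry assumption and the rate-distortion function is known in closed form, $R(D) = \ln m - h(D) - D\ln(m - 1)$ nats for $0 \le D \le 1 - 1/m$, with $h$ the binary entropy function. The sketch below evaluates $D_0(R)$ at $R = \ln 2$ by a simple grid search over $\beta$ (the grid itself is an arbitrary choice) and plugs the result back into $R(D)$.

```python
import numpy as np

def d0(rate, q, rho_row, betas):
    """Grid approximation of D_0(R) = max_beta -(ln E{exp(-beta*rho(x,Y))} + R)/beta."""
    log_mgf = np.log(np.exp(-np.outer(betas, rho_row)) @ q)
    return float(np.max(-(log_mgf + rate) / betas))

def rate_distortion_hamming(dist, m):
    """R(D) in nats, uniform m-ary source, Hamming distortion, 0 < D <= 1 - 1/m."""
    h = -dist * np.log(dist) - (1.0 - dist) * np.log(1.0 - dist)
    return np.log(m) - h - dist * np.log(m - 1)

m, d = 4, 2                                  # hypothetical alphabet size and branching factor
rate = np.log(d)                             # R = ln d nats per source symbol
q = np.full(m, 1.0 / m)                      # uniform Q*, optimal in this symmetric example
rho_row = np.array([0.0, 1.0, 1.0, 1.0])     # row x = 0 of the Hamming distortion matrix
betas = np.linspace(1e-3, 50.0, 200_001)     # search grid for beta (assumption)

dist = d0(rate, q, rho_row, betas)
print(dist, rate_distortion_hamming(dist, m), rate)
```

The second and third printed values should agree closely, i.e., the point $(D_0(R), R)$ lies on the rate-distortion curve of this example.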



4 Conclusion 



In this short paper, we tried to convey the following messages: (i) There is an intimate relationship
between tree coding and the statistical physics of the DPRM, which we believe is interesting, first
of all, in its own right. (ii) The statistical mechanical approach provides an alternative way to
prove the tree coding theorem. (iii) Existing results concerning the DPRM are harnessed right away
to provide almost-sure convergence to the distortion-rate function of the source, thus strengthening
the existing coding theorem, at least under a certain symmetry condition.

It is speculated that the various statistical mechanical techniques that were exercised in the
DPRM model (cf. last paragraph of Subsection 2.2) and otherwise may shed more light on ensemble
performance analysis in this and other information-theoretic settings of theoretical and practical
interest. This research direction is currently being pursued further.